Automated classification of A/E/C web content

Authors

  • R. Amor
  • K. Xu
Abstract

This paper examines the adaptation of a standard information retrieval technique, latent semantic indexing (LSI), for a domain-specific search engine. The premise of this approach is that the classification codes relevant to the content of a web page or web site can be identified accurately. If content can be classified accurately, then a user searching within a particular area (e.g. by specifying a classification code) will be presented only with highly relevant web information. We classify against a standard classification system because its codes are used and understood by the vast majority of professionals in the A/E/C (architecture, engineering and construction) industries. Because each classification code has a well-described scope, it is likely to be understood in the same way by professionals from many disciplines. A system that can accurately retrieve information associated with a classification code can therefore be tied to the many A/E/C processes in which information is already associated with these codes. This paper describes the ongoing development of the LSI-based search engine. It concentrates on testing the resulting search engine for the precision with which it classifies construction industry web pages against a construction industry classification system, measured by comparison with an expert's determination of the correct classifications. It also analyzes the accuracy of the search engine's results by comparison with those returned by other major search engines currently in use (e.g. Google and Yahoo) for the same query formulation.
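The abstract does not describe the indexing or assignment details, so the following is only a minimal sketch of how LSI-based classification and the precision measure could fit together. In standard LSI the term-by-page matrix A is approximated by its rank-k truncated SVD, A ≈ U_k S_k V_k^T, and any page or query vector q is folded into the latent space as z = S_k^{-1} U_k^T q. Everything else here is an assumption, not the authors' method: raw term counts instead of a weighting scheme, k = 3, nearest-centroid assignment by cosine similarity, a tiny stop list, and the hypothetical codes `roofing`, `concrete` and `hvac` with made-up training snippets. Power iteration on A A^T stands in for a library SVD so the example has no dependencies.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (an assumption, not from the paper).
STOPWORDS = {"a", "and", "for", "of", "the", "to"}

# Hypothetical training pages labelled with hypothetical classification codes.
EXAMPLE_TRAINING = [
    ("clay roof tiles and roofing membrane for pitched roof", "roofing"),
    ("roof flashing and roofing felt", "roofing"),
    ("ready mix concrete slab and reinforced concrete beams", "concrete"),
    ("concrete formwork and rebar for foundation slab", "concrete"),
    ("air conditioning duct and ventilation fan sizing", "hvac"),
    ("heating boiler and ventilation duct design", "hvac"),
]


def tokenize(text):
    """Lower-case alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx and ny else 0.0


def term_vector(text, vocab):
    """Raw term counts over a fixed vocabulary (no tf-idf or log-entropy weighting)."""
    counts = Counter(tokenize(text))
    return [float(counts[t]) for t in vocab]


def top_eigenpairs(m, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    vecs, vals = [], []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(len(m))]  # deterministic non-zero start
        for _ in range(iters):
            w = [dot(row, v) for row in m]
            for u in vecs:  # project out the eigenvectors found so far
                c = dot(w, u)
                w = [a - c * b for a, b in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                return vecs, vals  # no further non-zero eigenvalues (k > rank)
            v = [a / norm for a in w]
        lam = dot(v, [dot(row, v) for row in m])  # Rayleigh quotient
        if lam < 1e-9:
            break
        vecs.append(v)
        vals.append(lam)
    return vecs, vals


class LsiClassifier:
    """Assigns the code whose centroid is most cosine-similar in the LSI space."""

    def __init__(self, labelled_pages, k=3):
        self.vocab = sorted({t for text, _ in labelled_pages for t in tokenize(text)})
        cols = [term_vector(text, self.vocab) for text, _ in labelled_pages]
        n = len(self.vocab)
        # M = A A^T (terms x terms); its eigenvectors are the left singular vectors U.
        m = [[sum(c[i] * c[j] for c in cols) for j in range(n)] for i in range(n)]
        self.u, lams = top_eigenpairs(m, k)
        self.sigma = [math.sqrt(lam) for lam in lams]  # singular values of A
        sums, counts = {}, Counter()
        for (_, code), col in zip(labelled_pages, cols):
            z = self.fold_in(col)
            sums[code] = [a + b for a, b in zip(sums.get(code, [0.0] * len(z)), z)]
            counts[code] += 1
        self.centroids = {c: [a / counts[c] for a in s] for c, s in sums.items()}

    def fold_in(self, vec):
        """Project a term vector into the latent space: z = S_k^-1 U_k^T vec."""
        return [dot(u, vec) / s for u, s in zip(self.u, self.sigma)]

    def classify(self, text):
        z = self.fold_in(term_vector(text, self.vocab))
        return max(self.centroids, key=lambda c: cosine(z, self.centroids[c]))


def precision(predicted, expert):
    """Share of predicted codes that appear in the expert's code set for that page."""
    if not predicted:
        return 0.0
    hits = sum(1 for p, ok in zip(predicted, expert) if p in ok)
    return hits / len(predicted)


if __name__ == "__main__":
    clf = LsiClassifier(EXAMPLE_TRAINING, k=3)
    pages = ["pitched roof tiles supplier", "reinforced concrete foundation",
             "ventilation duct cleaning"]
    expert = [{"roofing"}, {"concrete"}, {"hvac"}]
    predicted = [clf.classify(p) for p in pages]
    print(predicted, precision(predicted, expert))  # ['roofing', 'concrete', 'hvac'] 1.0
```

With real pages one would normally use a sparse SVD routine and a term weighting scheme such as tf-idf or log-entropy rather than raw counts; the pure-Python power iteration only keeps the sketch self-contained.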

Similar resources

Image flip CAPTCHA

The massive and automated access to Web resources through robots has made it essential for Web service providers to make some conclusion about whether the "user" is a human or a robot. A Human Interaction Proof (HIP) like Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) offers a way to make such a distinction. CAPTCHA is a reverse Turing test used by Web serv...

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...

Practical Issues for Automated Categorization of Web Sites

In this paper we discuss several issues related to automated text classification of web sites. We analyze the nature of web content and metadata and requirements for text features. We present an approach for targeted spidering including metadata extraction and opportunistic crawling of specific semantic hyperlinks. We describe a system for automatically classifying web sites into industry categ...

On the Automated Classification of Web Sites

In this paper we discuss several issues related to automated text classification of web sites. We analyze the nature of web content and metadata in relation to requirements for text features. We find that HTML metatags are a good source of text features, but are not in wide use despite their role in search engine rankings. We present an approach for targeted spidering including metadata extract...

Automated classification of pulmonary nodules through a retrospective analysis of conventional CT and two-phase PET images in patients undergoing biopsy

Objective(s): Positron emission tomography/computed tomography (PET/CT) examination is commonly used for the evaluation of pulmonary nodules since it provides both anatomical and functional information. However, given the dependence of this evaluation on physician’s subjective judgment, the results could be variable. The purpose of this study was to develop an automated scheme for the classific...

Publication date: 2005